Abstract

Given a random sample of observations, mixtures of normal densities are often used 
to estimate the unknown continuous distribution from which the data come. Here 
we propose the use of this semiparametric framework for testing symmetry about an 
unknown value. More precisely, we show how the null hypothesis of symmetry may
be formulated in terms of a normal mixture model, with weights about the centre of
symmetry constrained to be equal to one another. The resulting model is nested in a
more general unconstrained one, with the same number of mixture components and free
weights. Therefore, after maximising the constrained and unconstrained
log-likelihoods by means of a suitable algorithm, such as Expectation-Maximisation,
symmetry is tested against skewness through a likelihood ratio statistic. The
performance of the proposed mixture-based test is illustrated through a Monte Carlo
simulation study, where we compare two versions of the test, based on different
criteria to select the number of mixture components, with the traditional one based on
the third standardised moment. An illustrative example that focuses on real data is
also given.

1 Introduction



Let $X_1, X_2, \ldots, X_n$ be a random sample from a continuous distribution $F(x)$ with
density $f(x)$. A problem that is often useful to consider is the symmetry of $f(x)$ about
some unknown value. Indeed, nonparametric methods assume the symmetry of the
distribution rather than its normality and, moreover, many parametric statistical methods
are robust to violations of the normality assumption on $f(x)$, since symmetry is often
sufficient for their validity. For instance, in the context of regression models, Bickel
(1982) shows that, if the conditional density of the errors is symmetric about zero, then the
regression coefficients may be estimated in an adaptive way. Knowledge about the
symmetry of $f(x)$ is also relevant for choosing which location parameter is most
representative of the distribution, since mean, median, and mode do not coincide in case
of skewness. Another situation in which testing for symmetry may be important is
encountered in case-control studies, which require the exchangeability of the joint
distribution of the observations on treated and control individuals. As exchangeability
implies the symmetry of the distribution, knowing that a distribution is skewed allows one
to exclude its exchangeability (Hollander, 1988).

Denoting by $\mu$ the mean or the median of $f(\cdot)$, the problem of testing symmetry
may be formulated as

$$H_0 : F(\mu - x) = 1 - F(\mu + x) \quad \forall x$$

against the alternative hypothesis of skewness

$$H_1 : F(\mu - x) \neq 1 - F(\mu + x)$$

for at least one $x$. Several procedures have been proposed in the literature to solve this
testing problem (for a review see Hollander (2006)) and they can be classified on the basis
of the skewness measure they use.

The best known skewness index is the third standardised moment $\gamma_1$, usually
estimated by the corresponding sample moment $b_1$. Gupta (1967) proves that, provided
that $F(x)$ has finite central moments up to order six, $b_1$ is asymptotically normally
distributed with a well-defined variance. By estimating this variance through the
corresponding sample moments, an asymptotically distribution-free test is obtained. Another
test based on $b_1$ is presented by D'Agostino (1970), who proposes a suitable
transformation of $b_1$ that has a standard normal distribution already for small sample
sizes (see also D'Agostino and Pearson (1973) and D'Agostino et al. (1990) for more
details). This test is quite popular, being one of the few symmetry tests implemented in
widespread statistical packages, such as Stata. However, its main drawback is that it
assumes the normality of $F(x)$ under the null hypothesis, thereby ignoring possible
positive or negative excess kurtosis in the distribution.

Although $\gamma_1$ is a traditional measure of skewness, it is not free of drawbacks: it is
sensitive to outliers and it can even be undefined for heavy-tailed distributions such as the
Cauchy; moreover, although it is equal to zero for symmetric distributions, a value of zero
does not necessarily mean that the distribution is symmetric (Ord, 1968; Johnson and
Kotz, 1970). Therefore, other symmetry tests have been developed which are based on
alternative measures of skewness, such as those proposed by Yule (1911) and Bonferroni
(1933), which take into account the difference between the mean and the median of the
population. Cabilio and Masaro (1996) propose a test based on an asymptotically normally
distributed estimator of Yule's skewness index. They obtain an asymptotically
distribution-free test by using the asymptotic variance derived under the normality
assumption, arguing that the effect of misspecification is negligible for most practical
problems. More recently, Miao et al. (2006) modify the procedure of Cabilio and Masaro
(1996) by replacing the sample standard deviation with a function of the absolute
differences between each observation and the sample median, and obtain a test that is
generally more powerful. Finally, Mira (1999) proposes a test based on an estimator of
Bonferroni's index and provides a consistent estimate of the variance of the test statistic.

Apart from the previously mentioned tests, several others exist which are based on
different skewness measures. One of the best known is a nonparametric test proposed
by Gupta (1967), based on the concept of stochastic dominance, in which the test statistic
is given by the difference between the number of positive and negative (in absolute value)
deviations from the median. Randles et al. (1980) instead propose a triples test, in which
triples of observations are considered and the presence of skewness is assessed by a suitable
function of the difference between the number of right triples (i.e., those in which the
middle observation is closer to the smallest one) and that of left triples (i.e., those in
which the middle observation is closer to the largest one). Another interesting class of
tests is represented by the runs tests (McWilliams, 1990; Modarres and Gastwirth, 1996):
after ordering the observations by absolute value while retaining their signs, the number
of sign changes in the sequence (the so-called runs) gives an indication about
the symmetry of the distribution.

The problem of testing symmetry has also received considerable attention in recent
years. Without claiming to be exhaustive, we recall the test of symmetry of Holgersson
(2010), based on a skewness index that combines the third standardised moment with a
suitable function of the difference between mean and median; the proposal of Zheng and
Gastwirth (2010) to use the bootstrap to estimate, for small sample sizes, the distribution
of the test statistic of some known tests; and the solution of Abd-Elfattah and Butler
(2011) for testing symmetry in the presence of right censoring. We also recall the
works of Ley and Paindaveine (2009) and Cassart et al. (2011) on the local
optimality properties of some parametric, semiparametric, and nonparametric tests, which
show that the traditional test based on $b_1$ is optimal in the proximity of normal
distributions.

Finally, we observe that part of the recent literature has addressed the problem
of testing symmetry by adopting nonparametric kernel estimation. Among
others, Fan and Gencay (1995) propose, in the context of linear regression, a test for
symmetry of the error distribution based on the kernel estimate of the density function of
the errors; Ngatchou-Wandji (2006) illustrates several tests based on kernel estimators of
skewness measures alternative to the traditional ones; Racine and Maasoumi (2007)
describe a kernel-based test that uses a metric entropy statistic. The use of kernel
estimators is appealing because, being nonparametric, they allow a better goodness of fit
than parametric methods; on the other hand, they suffer from a high number of unknown
parameters. An alternative approach that overcomes this drawback, and on which we focus
in this contribution, is represented by normal finite mixture (NM) models (Titterington
et al., 1985; Lindsay, 1996; McLachlan and Peel, 2000). Because any continuous
distribution, symmetric or skewed, can be approximated arbitrarily well by a finite mixture
of normal densities with common variance (Ferguson, 1983), NM models provide a
convenient semiparametric framework in which to model unknown distributional shapes,
retaining (i) a parsimony close to that of fully parametric methods, as represented by
a single density, and (ii) the flexibility of nonparametric methods, as represented by the
kernel method (Escobar and West, 1995; Robert, 1996; Roeder and Wasserman, 1997).

The aim of the present paper is to propose the use of NM models for testing symmetry of a
distribution about an unknown value. Indeed, the early studies of Pearson (1894) already
outline how a skewed distribution can be well described through a mixture of two normal
densities. The general idea is that if the sample observations come from a symmetric
distribution, then the weights of mixture components equidistant from the centre of
symmetry are equal, and they differ otherwise. Therefore, we show how the above-mentioned
null hypothesis $H_0$ of symmetry can be alternatively expressed in terms of constraints on
the weights. We also point out that a critical point in the proposed testing procedure
concerns the choice of the number of mixture components: in particular, we base our choice
on some well-known and commonly accepted information criteria, following McLachlan
and Peel (2000). Then, to decide whether or not to reject the null hypothesis, we
illustrate that a likelihood ratio test may be performed by comparing, as usual, the maximum
unconstrained log-likelihood of the model with the maximum constrained log-likelihood
(i.e., under $H_0$). More precisely, we show how to compute the log-likelihood function
of the NM model used for testing symmetry and how to maximise it through an
Expectation-Maximisation (EM) algorithm.

The performance of the proposed approach is illustrated through a Monte Carlo sim- 
ulation study that compares our proposed test with the traditional test of Gupta (1967), 
which is based on the third sample standardised moment. Our test is evaluated by se- 
lecting the optimal number of mixture components by using both Akaike's and Bayesian 
Information Criteria (AIC and BIC, respectively). Finally, an application to real data is 
illustrated. 

The paper is organised as follows. In Section 2 we describe the main characteristics
of an NM model and we illustrate the EM algorithm implemented to maximise the
log-likelihood of the model. Moreover, the proposed test of symmetry is formulated in terms
of constraints on the weights. The main results of the simulation study are shown in
Section 3, where our test is compared with the traditional test based on the third sample
standardised moment, whereas in Section 4 we describe the application of our proposed
test to real data. Finally, some remarks conclude our work.

2 The mixture-based test of symmetry 

In the following, we describe the main characteristics of the NM model on which our
test of symmetry is based and we give some indications about finding the maximum
log-likelihood through an EM algorithm. Finally, we formulate the null hypothesis of
symmetry as constraints on the weights of the NM model and we describe how to verify it
by a likelihood ratio test.

2.1 Mixture model 

Let $k$ be the number of normal components of the mixture, let $\alpha$ be the centre of
symmetry, and let $\beta$ be a scale parameter such that the support points of the mixture are

$$\nu_j = \alpha + \beta\delta_j, \quad j = 1, \ldots, k,$$

where $\delta_1, \ldots, \delta_k$ is a grid of equispaced points between $-1$ and $1$.
Therefore, the density of a mixture of $k$ normal components (NM$_k$) is defined as

$$f(x) = \sum_{j=1}^{k} \pi_j \phi(x; \nu_j, \sigma^2),$$

where $\phi(x; \nu_j, \sigma^2)$ denotes the density at $x$ of the distribution
$N(\nu_j, \sigma^2)$. The model log-likelihood is given by

$$\ell(\theta) = \sum_{i=1}^{n} \log \sum_{j=1}^{k} \pi_j \phi(x_i; \nu_j, \sigma^2),$$

where the parameters with respect to which it has to be maximised are
$\theta = (\alpha, \beta, \pi_1, \ldots, \pi_k)$. To compute these estimates, we can make
use of the well-known EM algorithm of Dempster et al. (1977), which is described in detail
in the following section.
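The definitions above can be made concrete with a small sketch. It assumes that the grid $\delta_1, \ldots, \delta_k$ includes the endpoints $-1$ and $1$, that a single component ($k = 1$) sits at $\alpha$, and that all components share the variance $\sigma^2$; the function names are hypothetical and chosen only for illustration.

```python
import math

def support_points(alpha, beta, k):
    """Support points nu_j = alpha + beta * delta_j on an equispaced grid in [-1, 1]."""
    if k == 1:
        return [alpha]  # assumption: a single component is placed at the centre
    deltas = [-1.0 + 2.0 * j / (k - 1) for j in range(k)]
    return [alpha + beta * d for d in deltas]

def normal_pdf(x, mean, var):
    """Density at x of N(mean, var)."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)

def nm_density(x, alpha, beta, var, weights):
    """Density of the k-component normal mixture with common variance."""
    nus = support_points(alpha, beta, len(weights))
    return sum(w * normal_pdf(x, nu, var) for w, nu in zip(weights, nus))

def nm_loglik(data, alpha, beta, var, weights):
    """Observed-data log-likelihood of the mixture."""
    return sum(math.log(nm_density(x, alpha, beta, var, weights)) for x in data)

# Support points implied by the k = 5 estimates of alpha and beta reported in Table 7
nus = support_points(2.9067, 2.0407, 5)
print([round(v, 3) for v in nus])
```

With the Table 7 estimates, the computed support points agree with the reported component means up to rounding, which is a quick consistency check on the parametrisation.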

2.2 EM algorithm 

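To introduce this algorithm consider the complete data log-likelihood

$$\ell_c(\theta) = \sum_{i=1}^{n}\sum_{j=1}^{k} z_{ij} \log \phi(x_i; \nu_j, \sigma^2) + \sum_{j=1}^{k} z_{\cdot j} \log \pi_j,$$

where $z_{ij}$ is a dummy variable equal to 1 if the $i$-th observation belongs to the $j$-th
component and to 0 otherwise, and $z_{\cdot j} = \sum_i z_{ij}$. The EM algorithm is based on
the following two steps, to be performed until convergence:

(E) compute the expected value of $z_{ij}$, $i = 1, \ldots, n$ and $j = 1, \ldots, k$, given
the observed data $x = (x_1, \ldots, x_n)$ and the current value of the parameters $\theta$;
in practice this expected value is computed as

$$\hat z_{ij} = \frac{\phi(x_i; \nu_j, \sigma^2)\pi_j}{\sum_h \phi(x_i; \nu_h, \sigma^2)\pi_h};$$

(M) maximise $\ell_c(\theta)$ with any $z_{ij}$ substituted by $\hat z_{ij}$. The derivatives
of $\ell_c(\theta)$ with respect to $\alpha$ and $\beta$ are, respectively,

$$\frac{\partial \ell_c(\theta)}{\partial \alpha} = \frac{1}{\sigma^2}\sum_i\sum_j \hat z_{ij}(x_i - \alpha - \beta\delta_j),$$

$$\frac{\partial \ell_c(\theta)}{\partial \beta} = \frac{1}{\sigma^2}\sum_i\sum_j \hat z_{ij}(x_i - \alpha - \beta\delta_j)\delta_j.$$

So, after some algebra, we can easily see that the solution is reached when:

$$\beta = \frac{\sum_i\sum_j \hat z_{ij}(x_i - \bar x)\delta_j}{\sum_j \hat z_{\cdot j}(\delta_j - \bar\delta)^2}, \qquad \alpha = \bar x - \beta\bar\delta, \qquad \sigma^2 = \frac{1}{n}\sum_i\sum_j \hat z_{ij}[x_i - (\alpha + \beta\delta_j)]^2,$$

where $\bar x = \sum_i x_i / n$ and $\bar\delta = \sum_j \hat z_{\cdot j}\delta_j / n$. The
maximisation with respect to the parameters $\pi_j$ is simply

$$\hat\pi_j = \frac{\hat z_{\cdot j}}{n}, \quad j = 1, \ldots, k. \qquad (1)$$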
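The two steps can be sketched in plain Python as follows. This is a minimal illustration rather than the authors' R implementation: the starting values, the stopping rule, the function names, and the log-sum-exp stabilisation of the E-step are assumptions made here. The optional `symmetric` flag pools the weights of mirrored components, which is the constrained update used when the model is fitted under the null hypothesis of symmetry.

```python
import math

def _e_step(data, nus, var, weights):
    """Posterior membership probabilities z_ij and the observed log-likelihood."""
    z, loglik = [], 0.0
    const = -0.5 * math.log(2.0 * math.pi * var)
    for x in data:
        logs = [math.log(w) - 0.5 * (x - nu) ** 2 / var if w > 0 else -math.inf
                for w, nu in zip(weights, nus)]
        top = max(logs)
        terms = [math.exp(v - top) for v in logs]
        total = sum(terms)
        z.append([t / total for t in terms])
        loglik += const + top + math.log(total)
    return z, loglik

def em_nm(data, k, symmetric=False, max_iter=500, tol=1e-8):
    """EM for a k-component normal mixture with support alpha + beta * delta_j."""
    n = len(data)
    xbar = sum(data) / n
    deltas = [0.0] if k == 1 else [-1.0 + 2.0 * j / (k - 1) for j in range(k)]
    # Starting values: an assumption of this sketch, not taken from the paper.
    alpha, beta = xbar, (max(data) - min(data)) / 2.0
    var = sum((x - xbar) ** 2 for x in data) / n
    weights = [1.0 / k] * k
    trace = []
    for _ in range(max_iter):
        nus = [alpha + beta * d for d in deltas]
        z, loglik = _e_step(data, nus, var, weights)
        converged = bool(trace) and loglik - trace[-1] < tol
        trace.append(loglik)
        if converged:
            break
        # M-step: weights (pooled over mirrored components if symmetric)
        z_dot = [sum(row[j] for row in z) for j in range(k)]
        if symmetric:
            weights = [(z_dot[j] + z_dot[k - 1 - j]) / (2.0 * n) for j in range(k)]
        else:
            weights = [zj / n for zj in z_dot]
        # M-step: beta, alpha, and the common variance
        dbar = sum(zj * d for zj, d in zip(z_dot, deltas)) / n
        num = sum(row[j] * (x - xbar) * deltas[j]
                  for x, row in zip(data, z) for j in range(k))
        den = sum(zj * (d - dbar) ** 2 for zj, d in zip(z_dot, deltas))
        if den > 0:
            beta = num / den
        alpha = xbar - beta * dbar
        var = sum(row[j] * (x - alpha - beta * deltas[j]) ** 2
                  for x, row in zip(data, z) for j in range(k)) / n
    else:
        # Iteration limit reached: evaluate the log-likelihood at the final parameters.
        nus = [alpha + beta * d for d in deltas]
        trace.append(_e_step(data, nus, var, weights)[1])
    return {"alpha": alpha, "beta": beta, "var": var,
            "weights": weights, "loglik": trace[-1], "trace": trace}

# Hypothetical right-skewed sample, used only to exercise the function
sample = [0.2, 0.5, 0.7, 0.9, 1.0, 1.1, 1.3, 1.4, 1.6, 1.8, 2.0, 2.3, 2.9, 3.8, 5.1]
fit_free = em_nm(sample, k=3)
fit_sym = em_nm(sample, k=3, symmetric=True)
print(fit_free["loglik"], fit_sym["loglik"])
```

Because each M-step maximises the expected complete-data log-likelihood exactly, the recorded log-likelihood trace should not decrease from one iteration to the next; in practice several starting values would be tried, since EM only guarantees a local maximum.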



A crucial point with NM models concerns the choice of the number $k$ of mixture
components. When the main aim of adopting an NM model is to use a semiparametric
framework for density estimation, as in our case, rather than to cluster observations,
McLachlan and Peel (2000) [Chap. 6] (see also the references cited therein) argue that the
well-known AIC (Akaike, 1973) and BIC (Schwarz, 1978) indices perform adequately
for choosing $k$. Accordingly, we suggest using these criteria, although they
may lead to different choices of $k$; more precisely, AIC tends to overestimate the true
number of components. Moreover, we only select $k$ as an odd number, so that there is one
mixture component, the $[(k + 1)/2]$-th, which corresponds to the centre of the distribution
and whose mean directly corresponds to the parameter $\alpha$.
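As a small worked example of this selection step, the sketch below applies the usual definitions AIC $= -2\hat\ell + 2p$ and BIC $= -2\hat\ell + p \log n$ to the unconstrained fits reported in Table 5 for the tomato roots data ($n = 40$). The function and variable names are illustrative only.

```python
import math

def aic(loglik, n_par):
    """Akaike information criterion."""
    return -2.0 * loglik + 2.0 * n_par

def bic(loglik, n_par, n):
    """Bayesian information criterion."""
    return -2.0 * loglik + n_par * math.log(n)

# (k, number of parameters, maximised log-likelihood): unconstrained fits from Table 5
fits = [(1, 2, -47.583), (3, 5, -40.554), (5, 7, -37.646), (7, 9, -37.847)]
n = 40

k_aic = min(fits, key=lambda f: aic(f[2], f[1]))[0]
k_bic = min(fits, key=lambda f: bic(f[2], f[1], n))[0]
print(k_aic, k_bic)  # prints: 5 3
```

The two criteria disagree here, with AIC choosing the larger model, which illustrates why both versions of the test are examined.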

2.3 Proposed test of symmetry 

In the proposed NM framework, the hypothesis of symmetry may be formulated as

$$H_0 : \pi_j = \pi_{k-j+1}, \quad j = 1, \ldots, [k/2],$$

where $[z]$ is the largest integer less than or equal to $z$ and $k$ is fixed. In other words,
in a symmetric density the components that mirror each other with respect to the centre of
symmetry are represented in equal proportions, whereas in a skewed density they are mixed in
different proportions.

We observe that the NM$_k$ model under $H_0$ is nested in the NM$_k$ model with
unconstrained $\pi_j$. Therefore, for testing symmetry we may use a likelihood ratio test,
based on the deviance

$$\mathrm{dev} = 2[\ell(\hat\theta) - \ell(\hat\theta_0)],$$

where $\hat\theta$ is the unconstrained maximum likelihood estimator of $\theta$ and
$\hat\theta_0$ is that under the constraint $H_0$, obtained according to the EM algorithm
described above. We note that, under $H_0$, the estimator $\hat\pi_j$ in equation (1) becomes

$$\hat\pi_j = \frac{\hat z_{\cdot j} + \hat z_{\cdot, k-j+1}}{2n}, \quad j = 1, \ldots, k.$$

Under $H_0$, dev is asymptotically distributed as a Chi-square with a number of degrees of
freedom equal to $[k/2]$, that is, the number of constrained weights. We observe that when
$k = 1$ is selected the NM model degenerates to a single normal distribution and, therefore,
the null hypothesis of symmetry is automatically accepted.
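The decision rule can be illustrated with a short sketch that turns two maximised log-likelihoods into a deviance, its degrees of freedom $[k/2]$, and a Chi-square p-value. The upper-tail probability is computed with a standard recurrence for integer degrees of freedom so that only the standard library is needed; the function names are hypothetical, and the input values are the log-likelihoods reported in Table 5.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability of a Chi-square with a positive integer df."""
    if x <= 0:
        return 1.0
    y = x / 2.0
    if df % 2 == 0:
        a, q = 1.0, math.exp(-y)              # closed form for df = 2
    else:
        a, q = 0.5, math.erfc(math.sqrt(y))   # closed form for df = 1
    while a < df / 2.0:
        # Q(a + 1, y) = Q(a, y) + y^a e^{-y} / Gamma(a + 1)
        q += y ** a * math.exp(-y) / math.gamma(a + 1.0)
        a += 1.0
    return q

def lr_symmetry_test(loglik_free, loglik_sym, k):
    """Deviance, degrees of freedom, and p-value of the mixture-based test."""
    df = k // 2
    if df == 0:
        return 0.0, 0, 1.0  # k = 1: a single normal, symmetry is never rejected
    dev = 2.0 * (loglik_free - loglik_sym)
    return dev, df, chi2_sf(dev, df)

# Log-likelihoods of the unconstrained and constrained fits from Table 5
print(lr_symmetry_test(-40.554, -43.394, k=3))
print(lr_symmetry_test(-37.646, -42.558, k=5))
```

Up to rounding of the reported log-likelihoods, the resulting deviances and p-values agree with those in Table 6.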

Note that, when $H_0$ is true, $k$ reflects the true number of latent groups in which the
population units are clustered. In the presence of a skewed density, instead, $k$ depends
both on the groups characterising the population and on the level of skewness, because more
than one normal component is usually needed to model a skewed distribution. Therefore,
there is no longer a one-to-one correspondence between the mixture components and
the groups.



3 Monte Carlo study 

This section summarises the results of a Monte Carlo study which shows the performance
of the proposed test of symmetry. Two versions based on AIC and BIC are compared
with the traditional test based on the third sample standardised moment

$$b_1 = \frac{m_3}{m_2^{3/2}},$$

where $m_r$ is the sample central moment of order $r$, given by
$m_r = \frac{1}{n}\sum_{i=1}^{n}(x_i - \bar x)^r$.

As outlined in the Introduction, $b_1$ is commonly used to estimate the third standardised
population moment

$$\gamma_1 = \frac{\mu_3}{\mu_2^{3/2}},$$

with $\mu_r = E[(X - \mu)^r]$. For samples from a symmetric distribution with finite sixth
order central moment, Gupta (1967) shows that $\sqrt{n}\,b_1$ is asymptotically normally
distributed with mean equal to 0 and variance equal to

$$\sigma^2 = \frac{\mu_6 - 6\mu_2\mu_4 + 9\mu_2^3}{\mu_2^3},$$

which may be consistently estimated by substituting $\mu_j$, $j = 2, 4, 6$, with the
appropriate sample moments. Therefore, under the null hypothesis of symmetry, the test
statistic

$$S_1 = \frac{b_1}{\sqrt{\hat\sigma^2 / n}}$$

has an asymptotic standard normal distribution.
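A minimal sketch of this moment-based test is given below. It assumes that the variance is estimated by plugging the sample central moments $m_2$, $m_4$, $m_6$ into the expression for $\sigma^2$ and that a two-sided p-value is taken from the standard normal distribution; the function names are illustrative.

```python
import math

def central_moment(data, r):
    """Sample central moment of order r."""
    n = len(data)
    xbar = sum(data) / n
    return sum((x - xbar) ** r for x in data) / n

def gupta_test(data):
    """Return b_1, the statistic S_1, and its two-sided normal p-value."""
    n = len(data)
    m2, m3, m4, m6 = (central_moment(data, r) for r in (2, 3, 4, 6))
    b1 = m3 / m2 ** 1.5
    sigma2_hat = (m6 - 6.0 * m2 * m4 + 9.0 * m2 ** 3) / m2 ** 3
    s1 = b1 / math.sqrt(sigma2_hat / n)
    p_value = math.erfc(abs(s1) / math.sqrt(2.0))  # equals 2 * (1 - Phi(|S_1|))
    return b1, s1, p_value

# A perfectly symmetric toy sample gives b_1 = 0 and hence a p-value of 1
print(gupta_test([-3.0, -2.0, -1.0, 0.0, 1.0, 2.0, 3.0]))
```

The same normal tail formula reproduces the p-value of about 0.075 that the empirical example reports for $S_1 = 1.782$.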

Within the Monte Carlo study we simulated 1000 samples of increasing size ($n = 20, 50,
100$) from each of the following distributions: a standard normal ($N(0, 1)$), a Student's
$t$ with 5 degrees of freedom ($t_5$), a Laplace or double exponential (Lap), a symmetric
mixture of three normal distributions (NM$_3$), a Chi-square with 1 degree of freedom
($\chi^2_1$), a Chi-square with 5 degrees of freedom ($\chi^2_5$), a Chi-square with 10
degrees of freedom ($\chi^2_{10}$), and a lognormal with mean and variance equal to 1
(logN). All tests are performed at nominal levels $\alpha$ equal to 0.01, 0.05, and 0.10.
All analyses are implemented in R.

The comparison between the two versions of the proposed mixture-based test and
Gupta's $S_1$-based test is performed by taking into account the following criteria
for a good test: (i) the empirical type-I error probability must not be higher than the
nominal significance level for distributions satisfying the null hypothesis of symmetry and
(ii) the empirical power against skewed alternatives must be as high as possible. These two
quantities are reported for the simulated data in Tables 1 and 2, respectively.
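The way such empirical levels and powers are obtained can be sketched as follows. This is only an illustration in Python of the general Monte Carlo scheme, not the R code used for the study: it evaluates the moment-based test alone (restated compactly so that the block is self-contained), and the seed, the helper names, and the choice of $n = 50$ at the 0.05 level are assumptions.

```python
import math
import random

def skewness_test_pvalue(data):
    """Two-sided p-value of the third-moment statistic S_1."""
    n = len(data)
    xbar = sum(data) / n
    m = {r: sum((x - xbar) ** r for x in data) / n for r in (2, 3, 4, 6)}
    b1 = m[3] / m[2] ** 1.5
    sigma2_hat = (m[6] - 6.0 * m[2] * m[4] + 9.0 * m[2] ** 3) / m[2] ** 3
    s1 = b1 / math.sqrt(sigma2_hat / n)
    return math.erfc(abs(s1) / math.sqrt(2.0))

def rejection_rate(sampler, n, level=0.05, reps=1000, seed=1):
    """Share of simulated samples of size n for which symmetry is rejected."""
    rng = random.Random(seed)
    rejections = sum(
        skewness_test_pvalue([sampler(rng) for _ in range(n)]) < level
        for _ in range(reps)
    )
    return rejections / reps

def standard_normal(rng):
    return rng.gauss(0.0, 1.0)

def chi_square_5(rng):
    # A Chi-square with 5 degrees of freedom as a sum of squared standard normals
    return sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(5))

level_hat = rejection_rate(standard_normal, n=50)  # empirical type-I error
power_hat = rejection_rate(chi_square_5, n=50)     # empirical power
print(level_hat, power_hat)
```

Replacing `skewness_test_pvalue` with a function that fits the constrained and unconstrained mixtures and returns the likelihood ratio p-value would give the corresponding figures for the mixture-based test.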

As concerns the empirical significance level (Table 1), the mixture-based test shows
a performance very similar to that of Gupta's test when the number $k$ of components is
selected by means of BIC. On the contrary, when AIC is used for model selection,
the empirical level is consistently higher than the nominal one: in other words,
the type-I error is committed too often. This may be explained through the results in Table
3, which shows the empirical percentage frequency distributions of the optimal $k$ values
selected according to AIC or BIC. For data from symmetric distributions, the value of $k$
should coincide with the actual number of groups, that is, one for the first three cases and
three for the NM$_3$. However, as we can observe from Table 3, AIC overestimates $k$
more often than BIC.

On the other hand, this tendency of AIC to choose a relatively high
number of mixture components results in a good performance of the mixture-based test
for skewed distributions. In this case, the empirical power is clearly better than that of
the variant using BIC and, above all, that of Gupta's test (Table 2). We also
observe that the BIC variant of the mixture-based test is almost always more powerful
than Gupta's test. The only exception is observed for data generated
from $\chi^2_{10}$ with $n = 100$: this is the case closest to symmetry among those
considered in our study and, as shown in Table 4, both AIC and BIC tend to select a small
number of mixture components. For all three tests, we observe that, as the
sample size increases, the empirical significance level remains constant and the empirical
power increases.

In conclusion, the proposed test based on BIC performs better than or,
at worst, similarly to the traditional test of symmetry.

Table 1: Empirical significance levels of the mixture test based on AIC, the mixture test
based on BIC, and Gupta's test at significance levels 0.01, 0.05, 0.10, based on 1000
simulated samples of size n = 20, 50, 100 from certain symmetric distributions.



4 Empirical example 

In this section we illustrate the results obtained by testing the hypothesis of symmetry
for a set of n = 40 observations about tomato roots, whose histogram is shown
in Figure 1. These data have been analysed through an NM model by Gutierrez et al.
(1995) to identify the number of physical phenomena underlying the process of later root
initiation. The authors showed, through an approach based on the Box-Cox transformation,
that the need for an NM$_2$ to adequately fit the data is due to the skewness of the data
rather than to the presence of two physical phenomena behind the process at issue. Here
we verify whether their conclusions about the skewness of the data are confirmed by our test.

We first select the optimal number of mixture components by means of AIC and
BIC. As shown in Table 5, for the general model with unconstrained weights, AIC
selects $k = 5$ normal components, with a corresponding log-likelihood equal to $-37.646$,
whereas BIC is more parsimonious and suggests using $k = 3$ components, with a
corresponding log-likelihood equal to $-40.554$. The corresponding AIC and BIC

Table 2: Empirical power levels of the mixture test based on AIC, the mixture test based on
BIC, and Gupta's test at significance levels 0.01, 0.05, 0.10, based on 1000 simulated
samples of size n = 20, 50, 100 from certain skewed distributions.

Figure 1: Histogram of tomato roots data.
n = 100

k       AIC     BIC     AIC     BIC     AIC     BIC     AIC     BIC
1      85.5    99.4    40.1    73.8    10.7    51.1     0.0     0.0
3      11.4     0.6    46.3    25.2    53.8    44.0    89.0    98.8
5       2.4     0.0    11.3     1.0    25.9     4.7     9.5     1.1
>5      0.7     0.0     2.3     0.0     9.6     0.2     1.5     0.1

Table 3: Percentage frequencies of k values selected by means of AIC and BIC for 1000
samples of size n = 20, 50, 100 simulated from certain symmetric distributions.



k       AIC     BIC     AIC     BIC     AIC     BIC     AIC     BIC

n = 20
1       7.0    17.9    46.9    71.5    61.3    83.5    10.3    22.7
3      30.4    40.2    38.6    24.7    29.3    15.0    46.2    50.5
5      37.5    31.6    13.0     3.7     7.9     1.3    30.3    22.5
>5     25.1    10.3     1.5     0.1     1.5     0.2    13.2     4.3

n = 50
1       0.0     1.3    15.6    56.3    39.6    80.4     0.2     1.4
3      13.5    25.7    40.6    36.0    39.5    18.2    23.5    41.3
5      23.6    35.0    34.5     7.6    16.7     1.4    25.2    33.1
>5     62.9    38.0     9.3     0.1     4.2     0.0    51.1    24.2

n = 100
1       0.0     0.0     0.8    24.7    12.8    63.1     0.0     0.0
3       2.5     7.6    18.9    47.9    41.0    32.8     7.8    16.4
5       6.3    15.0    47.5    24.1    33.8     4.0    10.0    23.9
>5     91.2    77.4    32.8     3.3    12.4     0.1    82.2    59.7

Table 4: Percentage frequencies of k values selected by means of AIC and BIC for 1000
samples of size n = 20, 50, 100 simulated from certain skewed distributions.



values (89.292 and 99.552, respectively) remain the overall minima even when the
constrained model (i.e., the model under $H_0$) is also considered. In fact, in this latter
case the minimum AIC is 94.789 and is obtained for $k = 3$, whereas the minimum BIC is
101.545, again observed for $k = 3$.

In Table 6 the results of the deviance test based on the NM$_3$ and NM$_5$ models are shown.
For $k = 5$, with a p-value equal to 0.00736, the null hypothesis of symmetry or, equivalently,
the hypothesis that $\pi_1 = \pi_5$ and $\pi_2 = \pi_4$, is strongly rejected in favour of
that of skewness

          $H_0$ false                                  $H_0$ true
k    # par   $\hat\ell$    AIC        BIC          # par   $\hat\ell$    AIC        BIC
1    2       -47.583       99.165     102.543      2       -47.583       99.165     102.543
3    5       -40.554       91.108     99.552*      4       -43.394       94.789*    101.545*
5    7       -37.646       89.292*    101.114      5       -42.558       95.116     103.560
7    9       -37.847       93.695     108.900      6       -42.757       97.513     107.646

Table 5: Number of mixture components selection: number of parameters, log-likelihood,
AIC value, and BIC value under skewness and symmetry assumptions (the minimum of AIC
and BIC in each case is marked with an asterisk).



(i.e., at least one equality is not true). The same conclusion is reached by adopting $k = 3$
mixture components, although the p-value is higher (0.01715). Note that Gupta's test
gives $S_1 = 1.782$ with $p = 0.0748$, so that the symmetry hypothesis is not rejected.





            k = 3      k = 5
deviance    5.681      9.823
df          1          2
p-value     0.01715    0.00736

Table 6: Mixture-based test for k = 3, 5: deviance, degrees of freedom, p-value.

To conclude, the analysed data may be described with a mixture of three or five normal
components (according to the adopted model selection criterion). In both cases (Table 7)
most of the data are clustered in the second component ($\hat\pi_2 = 0.8804$,
$\hat\mu_2 = 2.0154$ for $k = 3$ and $\hat\pi_2 = 0.7569$, $\hat\mu_2 = 1.8863$ for $k = 5$),
followed by the third one ($\hat\pi_3 = 0.1196$, $\hat\mu_3 = 3.9515$ and
$\hat\pi_3 = 0.1578$, $\hat\mu_3 = 2.9067$, respectively); the first component has a
negligible weight. The variance is assumed to be constant over all the normal
components and is estimated as 0.2372 for $k = 3$ and 0.1119 for $k = 5$. Finally, for
$k = 5$, components four and five gather 6.02% and 2.51% of the observations, respectively,
with high average values.

                  k = 3      k = 5
$\hat\pi_1$       0.0000     0.0000
$\hat\pi_2$       0.8804     0.7569
$\hat\pi_3$       0.1196     0.1578
$\hat\pi_4$                  0.0602
$\hat\pi_5$                  0.0251
$\hat\alpha$      2.0155     2.9067
$\hat\beta$       1.9360     2.0407
$\hat\mu_1$       0.0794     0.8660
$\hat\mu_2$       2.0154     1.8863
$\hat\mu_3$       3.9515     2.9067
$\hat\mu_4$                  3.9271
$\hat\mu_5$                  4.9474
$\hat\sigma^2$    0.2372     0.1119

Table 7: Parameter estimates under models NM$_3$ and NM$_5$.



Finally, in Figure 2 we show the estimated density under the constrained and
unconstrained NM$_3$ models (left panel) and NM$_5$ models (right panel), overlaid on the
histogram of the observed data. For both values of $k$, the unconstrained NM model
clearly fits better than the constrained one, because it can take into account the
positive skewness of the data.



Figure 2: Histogram of tomato roots data with the estimated density under the
unconstrained (dashed line) and constrained (solid line) NM$_3$ and NM$_5$ models.



5 Concluding remarks 

After reviewing the literature on testing for symmetry, in this
contribution we pointed out the existence of an interesting framework for performing this
test that, at least to our knowledge, has so far been ignored: that of normal mixture (NM)
models. Indeed, NM models represent a semiparametric method to approximate unknown
continuous densities with a satisfactory goodness of fit, especially in the presence of
skewness. Therefore, they offer a natural setting in which to place the study of the symmetry
of a distribution.

We first described the main characteristics of an NM model, illustrating in detail the
EM algorithm implemented for parameter estimation. Then, we formulated the hypothesis
test at issue in terms of constraints on the weights characterising the NM model. Moreover,
we described how a likelihood ratio test is obtained, based on a test statistic distributed
according to a Chi-square with a number of degrees of freedom depending on the number
of constraints and, therefore, on the number of mixture components.

A Monte Carlo study showed how the performance of the proposed test depends
on the criterion used to select the number of mixture components. More precisely, we
observed that using BIC a good empirical level of significance is obtained, comparable
with that of the traditional test based on the third standardised moment (Gupta, 1967).
On the other hand, the empirical power of our test with BIC was usually better than
that observed with Gupta's test.

An analysis of real data about the process of later root initiation in tomatoes
illustrated the application of the proposed mixture-based test. Both criteria used to select
the number of mixture components led to the conclusion that the distribution is skewed,
as opposed to Gupta's test, and made it possible to describe in detail the unknown
underlying distribution from which the data come.